Performance Comparison of K-Nearest Neighbor, Decision Tree, and Support Vector Machine Algorithms for Diabetes Classification
Abstract
This paper investigates the performance of three supervised machine learning algorithms, K-Nearest Neighbor (KNN), Decision Tree (DT), and Support Vector Machine (SVM), for diabetes classification using the Pima Indians Diabetes Dataset. To ensure a fair and consistent comparison, all three models share the same preprocessing (median imputation of clinically invalid values and feature standardization) and the same evaluation protocol (stratified 5-fold cross-validation). Model performance is evaluated using accuracy, precision, recall, and F1-score, with particular emphasis on recall for the diabetic class, because every false negative corresponds to a diabetic patient who goes undiagnosed. Experimental results show that the Decision Tree model achieves the most balanced performance, with an average accuracy of 0.78 and an F1-score of 0.75, while maintaining higher recall for diabetic cases than KNN and SVM. Although SVM and KNN reach acceptable overall accuracy, both models are less effective at identifying minority-class (diabetic) instances. These findings show that algorithm selection should consider not only accuracy but also clinical priorities such as interpretability and sensitivity to positive cases. The study offers practical insights for developing reliable machine-learning-based decision support systems for early diabetes screening.
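The unified preprocessing and evaluation protocol described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes that a zero in selected physiological columns marks a clinically invalid (missing) value, as is common when working with this dataset, and it fits the imputation medians and standardization parameters on the training folds only, so that no information leaks from the held-out fold. The function names (`stratified_kfold_indices`, `fit_preprocessor`, `transform`) and the toy data are hypothetical; in practice a library such as scikit-learn (`StratifiedKFold`, `SimpleImputer`, `StandardScaler`) would typically do this work.

```python
import random
from statistics import mean, median, pstdev


def stratified_kfold_indices(labels, k=5, seed=0):
    """Split sample indices into k folds that preserve the class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0  # carry the round-robin position across classes to balance fold sizes
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[(offset + pos) % k].append(i)
        offset += len(idx)
    return folds


def impute(row, medians):
    """Replace a zero in an 'invalid-if-zero' column with that column's median."""
    # Assumption: 0 marks a missing value in the columns that appear in `medians`.
    return [medians[c] if c in medians and v == 0 else v for c, v in enumerate(row)]


def fit_preprocessor(train_rows, zero_invalid_cols):
    """Learn imputation medians and z-score parameters from training rows only."""
    medians = {
        c: median([r[c] for r in train_rows if r[c] != 0]) for c in zero_invalid_cols
    }
    imputed = [impute(r, medians) for r in train_rows]
    columns = list(zip(*imputed))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # guard against constant columns
    return medians, means, stds


def transform(row, params):
    """Impute, then standardize a single row using previously fitted parameters."""
    medians, means, stds = params
    return [(v - m) / s for v, m, s in zip(impute(row, medians), means, stds)]


if __name__ == "__main__":
    # Hypothetical toy data: column 0 uses 0 for "missing", column 1 is always valid.
    X = [[0, 1.0], [2, 3.0], [4, 5.0], [6, 7.0], [0, 9.0], [8, 11.0]]
    y = [0, 0, 0, 1, 1, 1]
    for test_idx in stratified_kfold_indices(y, k=3):
        train_idx = [i for i in range(len(y)) if i not in test_idx]
        params = fit_preprocessor([X[i] for i in train_idx], zero_invalid_cols=[0])
        X_train = [transform(X[i], params) for i in train_idx]
        X_test = [transform(X[i], params) for i in test_idx]
        # Train KNN, DT, or SVM on X_train and score it on X_test (omitted here).
        print(len(X_train), len(X_test))
```

Fitting the preprocessor inside the loop, rather than once on the full dataset, is the design choice that keeps each held-out fold unseen; whether the original study did the same is not stated in the abstract.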
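The emphasis on recall for the diabetic class follows directly from how the metrics are defined: recall is TP / (TP + FN), so it drops with every diabetic case the model misses, while accuracy can stay high simply because non-diabetic cases are the majority. The sketch below computes accuracy together with precision, recall, and F1 for the positive class from confusion-matrix counts, then averages each metric across folds. It assumes labels are encoded as 1 = diabetic and 0 = non-diabetic; the abstract does not say whether its reported F1-score is class-specific or macro-averaged, so the positive-class version here is an illustrative choice. The function names are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy plus precision, recall, and F1 for the positive (diabetic) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of diabetic cases caught
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def average_over_folds(per_fold):
    """Mean of each metric across the cross-validation folds."""
    return {k: sum(m[k] for m in per_fold) / len(per_fold) for k in per_fold[0]}


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    model_a = [1, 1, 0, 1, 0, 0, 0, 0]  # catches 2 of the 3 diabetic cases
    always_negative = [0] * 8           # never predicts diabetes
    print(binary_metrics(y_true, model_a))
    print(binary_metrics(y_true, always_negative))
```

On these toy labels the always-negative baseline still reaches an accuracy of 0.625 while its recall is 0.0, which illustrates why accuracy alone can hide missed diabetic cases.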

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish articles in CoreID Journal agree to the following terms:
- Authors retain copyright of the article and grant the journal the right of first publication, with the work simultaneously licensed under the Creative Commons Attribution-ShareAlike (CC BY-SA) License.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).